More precise writebarrier for regions #67389

Merged

PeterSolMS
Contributor

This introduces a lookup table for regions where we can find the current generation and the planned generation efficiently.

The table has byte-sized elements where the low nibble is the current generation and the high nibble is the planned generation.
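
As an illustration of the layout described above, here is a minimal sketch of a per-region byte table with the current generation in the low nibble and the planned generation in the high nibble. The struct name, the 4 MB region size, and the accessor names are assumptions made for this example; they are not taken from the GC sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one byte per region.
//   low nibble  = current generation
//   high nibble = planned generation
struct RegionGenTable {
    static constexpr unsigned kRegionShift = 22;  // assumed 4 MB regions
    std::uintptr_t base;                          // lowest address covered by the table
    std::vector<std::uint8_t> entries;            // one packed byte per region

    RegionGenTable(std::uintptr_t base_addr, std::size_t region_count)
        : base(base_addr), entries(region_count, 0) {}

    // Map an address to the index of the region that contains it.
    std::size_t index_of(std::uintptr_t addr) const {
        assert(addr >= base);
        std::size_t idx = (addr - base) >> kRegionShift;
        assert(idx < entries.size());
        return idx;
    }

    // Store both generations for the region containing addr in a single byte.
    void set(std::uintptr_t addr, unsigned current_gen, unsigned plan_gen) {
        assert(current_gen < 16 && plan_gen < 16);  // each must fit in a nibble
        entries[index_of(addr)] =
            static_cast<std::uint8_t>((plan_gen << 4) | current_gen);
    }

    unsigned current_gen(std::uintptr_t addr) const {
        return entries[index_of(addr)] & 0x0F;
    }

    unsigned plan_gen(std::uintptr_t addr) const {
        return entries[index_of(addr)] >> 4;
    }
};
```

Both generations come back from one byte load plus a shift or mask, which is what makes the lookup cheap enough for a write barrier.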

The table is used in mark_through_cards_helper and in the write barriers (for now only the most frequently used ones; Array.Copy has its own way of setting cards, which I haven't fixed yet).

I have changed the write barrier to set only a single bit when a pointer to a younger generation is stored into an object in an older generation. This costs an interlocked operation when the bit is not already set. Hopefully this will be more than compensated for by the lower cost of card marking.
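
A rough sketch of the single-bit idea, assuming a card table stored as 32-bit words with one bit per card. The function name, the word size, and the relaxed memory ordering are assumptions for illustration only; the real barriers are hand-written assembly and may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical card table: one bit per card, 32 cards per word.
// Returns true if this call set the bit, false if it was already set.
inline bool set_card_bit(std::atomic<std::uint32_t>* card_words,
                         std::size_t word_count, std::size_t card_index) {
    std::size_t word = card_index / 32;
    assert(word < word_count);
    std::uint32_t mask = std::uint32_t{1} << (card_index % 32);

    // Plain read first: if the card is already marked, skip the
    // interlocked operation entirely.
    if (card_words[word].load(std::memory_order_relaxed) & mask)
        return false;

    // Interlocked OR so that concurrent writers setting neighbouring bits
    // in the same word do not lose each other's updates.
    // (Relaxed ordering is an assumption of this sketch.)
    std::uint32_t old = card_words[word].fetch_or(mask, std::memory_order_relaxed);
    return (old & mask) == 0;
}
```

The interlocked OR is only paid the first time a given card is marked; repeated stores into the same card take the cheap read-only path.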

I haven't yet implemented committing only the part of the lookup table that is needed.

…g 4 bits for the current and planned generation. WKS shows no overall improvement, SVR crashes.
…l limits where objects in ephemeral regions may be located. We have to do a range check on the child object in mark_through_cards_helper anyway, and using the ephemeral range allows us to skip the table lookup in the cases where a child object cannot possibly be in an ephemeral region.
- accidentally removed setting plan gen num
- need to make the default write barrier larger so we have enough space
- fix copy & paste issue in GetCurrentWriteBarrierCode
…ons would remain stuck. This was because in these cases we would not explore the complete range of card table entries for the card bundle.
 - use BitScanForward in find_card, find_card_dword
 - when we start a new card dword, consult the card bundles first
 - change JIT_ByRefWriteBarrier to consult the region_to_generation_table and set only single bits in the card table.
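
To illustrate the bit-scan approach mentioned in the commits above, here is a sketch of finding the next set card by scanning one 32-bit word at a time. `find_next_card` and `lowest_set_bit` are hypothetical names, not the GC's `find_card`, and the sketch does not model the card bundle check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Index of the lowest set bit; value must be non-zero.
inline unsigned lowest_set_bit(std::uint32_t value) {
    assert(value != 0);
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, value);
    return static_cast<unsigned>(index);
#else
    return static_cast<unsigned>(__builtin_ctz(value));
#endif
}

// Hypothetical helper: find the first set card at or after `start`.
// Returns `card_count` if no such card is set.
inline std::size_t find_next_card(const std::uint32_t* card_words,
                                  std::size_t card_count, std::size_t start) {
    if (start >= card_count) return card_count;
    std::size_t word = start / 32;
    std::size_t last_word = (card_count - 1) / 32;

    // In the first word, ignore the bits below `start`.
    std::uint32_t bits = card_words[word] & (~std::uint32_t{0} << (start % 32));
    while (true) {
        if (bits != 0) {
            // One bit scan replaces a bit-by-bit loop over the word.
            std::size_t card = word * 32 + lowest_set_bit(bits);
            return card < card_count ? card : card_count;
        }
        if (word == last_word) return card_count;
        bits = card_words[++word];
    }
}
```

With single-bit card marking the table is much sparser, so skipping whole zero words and jumping straight to the lowest set bit saves most of the scanning work.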
@ghost
ghost commented Mar 31, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Author: PeterSolMS
Assignees: PeterSolMS
Labels:

area-GC-coreclr

Milestone: -

}
if (ephemeral_change)
{
stomp_write_barrier_ephemeral (ephemeral_low, ephemeral_high,
Member

stomp_write_barrier_ephemeral must be called while the EE is suspended. If we call it from init_heap_segment, it can be reached when a new gen0 region is acquired while the EE is running.

Member

and when it's called from find_first_valid_region, we could optimize and only call this once at the end of the GC (if the ephemeral range actually got larger).

…f just a single bit. This will allow us to determine the tradeoff between being more precise in the write barrier, which saves work in card marking, and being faster in the write barrier, which causes more work in card marking.
@Maoni0
Member

Maoni0 commented Jul 7, 2022

running on a 1st party prod workload -

| index | Metric | Baseline | New | Diff | Diff % |
| --- | --- | --- | --- | --- | --- |
| 3 | Process Duration (Sec) | 53,887.63 | 53,858.88 | -28.751 | -0.053 |
| 4 | Total Allocated MB | 16,252,981.58 | 15,819,053.88 | -433,927.69 | -2.67 |
| 5 | Max Size Peak MB | 26,878.44 | 27,017.04 | 138.607 | 0.516 |
| 6 | GC Count | 7,269.00 | 7,573.00 | 304 | 4.182 |
| 7 | Heap Count | 48 | 48 | 0 | 0 |
| 8 | Gen0 Count | 3,630.00 | 3,779.00 | 149 | 4.105 |
| 9 | Gen1 Count | 3,466.00 | 3,621.00 | 155 | 4.472 |
| 10 | Ephemeral Count | 7,096.00 | 7,400.00 | 304 | 4.284 |
| 11 | Gen2 Blocking Count | 4 | 4 | 0 | 0 |
| 12 | BGC Count | 169 | 169 | 0 | 0 |
| 13 | Gen0 Total Pause Time MSec | 269,123.47 | 231,960.87 | -37,162.60 | -13.81 |
| 14 | Gen1 Total Pause Time MSec | 355,468.13 | 337,579.93 | -17,888.20 | -5.032 |
| 15 | Ephemeral Total Pause Time MSec | 624,591.60 | 569,540.80 | -55,050.80 | -8.814 |
| 16 | Blocking Gen2 Total Pause Time MSec | 4,260.20 | 2,376.57 | -1,883.63 | -44.22 |
| 17 | BGC Total Pause Time MSec | 14,630.25 | 13,270.05 | -1,360.20 | -9.297 |
| 18 | GC Pause Time % | 1.194 | 1.087 | -0.108 | -9.011 |
| 19 | Avg. Gen0 Pause Time (ms) | 74.139 | 61.382 | -12.757 | -17.21 |
| 20 | Avg. Gen1 Pause Time (ms) | 102.559 | 93.228 | -9.33 | -9.097 |
| 21 | Avg. Gen0 Promoted (mb) | 170.597 | 166.073 | -4.524 | -2.652 |
| 22 | Avg. Gen1 Promoted (mb) | 343.586 | 332.5 | -11.087 | -3.227 |
| 23 | Avg. Gen0 Speed (mb/ms) | 2.301 | 2.706 | 0.405 | 17.58 |
| 24 | Avg. Gen1 Speed (mb/ms) | 3.35 | 3.567 | 0.216 | 6.458 |

looking at 500 GCs during steady state as an example -

image

…contain flags to indicate whether a region is sweep-in-plan, and whether it has been demoted.

Generalize the config setting to change the write barrier to allow reverting to the SVR type write barrier as well.
Bug fixes concerning setting the ephemeral limits in the write barrier, and where to compute the ephemeral limits within the GC.
Use the lookup via the map_region_to_generation table in the mark phase as well.
…dy relocated and thus shouldn't be tested against gc_low/gc_high.
…doesn't need to be updated between GCs.

Removed file name argument to _ASSERTE_ALL_BUILDS macro.
… Volatile<T> expands to T volatile.

Use fixed ephemeral bounds for now, but keep more sophisticated code for setting ephemeral_low around.
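
The ephemeral bounds discussed in this PR act as a cheap pre-filter: a child pointer outside the range cannot be in an ephemeral region, so the table lookup can be skipped. A minimal sketch of that check follows; `EphemeralRange`, `child_needs_card`, the half-open bounds, and the generation comparison are all assumptions made for this example rather than the GC's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bounds of the address range that may contain ephemeral regions.
struct EphemeralRange {
    std::uintptr_t low;   // inclusive (assumed)
    std::uintptr_t high;  // exclusive (assumed)

    bool may_contain(std::uintptr_t addr) const {
        assert(low <= high);
        return addr >= low && addr < high;
    }
};

// Decide whether a reference from an object in generation `parent_gen` to
// `child` is a cross-generation pointer that needs a card.
// `gen_of` stands in for the region-to-generation table lookup.
template <typename GenLookup>
bool child_needs_card(const EphemeralRange& range, std::uintptr_t child,
                      unsigned parent_gen, GenLookup gen_of) {
    // Range check first: outside the ephemeral range, no lookup is needed.
    if (!range.may_contain(child))
        return false;
    // Inside the range, consult the table for the precise generation.
    return gen_of(child) < parent_gen;
}
```

The range check is needed on the child pointer anyway, so using it to gate the lookup adds no extra work on the common path.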
@AndyAyersMS
Member

This improved crossgen2 throughput:
newplot - 2022-09-08T093600 033
